Decentralized Multi-Agent Linear Bandits with Safety Constraints
Abstract
We study decentralized stochastic linear bandits, where a network of N agents acts cooperatively to efficiently solve a linear bandit-optimization problem over a d-dimensional space. For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network. At each round of the algorithm, each agent chooses its actions following an upper confidence bound (UCB) strategy, and agents share information with their immediate neighbors through a carefully designed consensus procedure that repeats over cycles. Our analysis adjusts the duration of these communication cycles, ensuring near-optimal regret performance O(d \log{NT}\sqrt{NT}) at a communication rate of O(dN^2) per round. The structure of the network affects the regret performance via a small additive term, coined the regret of delay, that depends on the spectral gap of the underlying graph. Notably, our results apply to arbitrary network topologies without a requirement for a dedicated agent acting as a server. In consideration of situations with high communication cost, we propose RC-DLUCB: a modification of DLUCB with rare communication among agents. The new algorithm trades off regret performance for a significantly reduced total communication cost of O(d^3N^{5/2}) over all T rounds. Finally, we show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits. For the recently studied problem of linear bandits with unknown linear safety constraints, we propose the first safe decentralized algorithm. Our study contributes towards applying bandit techniques in safety-critical distributed systems that repeatedly deal with unknown stochastic environments. We present numerical simulations for various network topologies that corroborate our theoretical findings.
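To make the role of neighbor-to-neighbor consensus more concrete, here is a minimal sketch of plain averaging consensus on a ring of agents. It is not the DLUCB consensus procedure itself. The ring topology, the equal mixing weights, and all names (`ring_neighbors`, `consensus_step`, `run_consensus`, `local_stats`) are assumptions chosen for illustration. Each agent repeatedly mixes its own value with those of its immediate neighbors, and every agent's value approaches the network-wide average at a speed governed by the spectral gap of the mixing matrix.

```python
def ring_neighbors(n):
    """Neighbors of each agent on an undirected ring (assumes n >= 3)."""
    return [((i - 1) % n, (i + 1) % n) for i in range(n)]


def consensus_step(values, neighbors, weight=1.0 / 3.0):
    """One synchronous averaging step.

    Each agent keeps a share of its own value and adds an equal share
    of each neighbor's value. With symmetric weights the sum of all
    values is preserved, so the network average never changes.
    """
    new_values = []
    for i, x in enumerate(values):
        mixed = (1.0 - weight * len(neighbors[i])) * x
        mixed += sum(weight * values[j] for j in neighbors[i])
        new_values.append(mixed)
    return new_values


def run_consensus(values, neighbors, steps):
    """Apply the averaging step a fixed number of times."""
    for _ in range(steps):
        values = consensus_step(values, neighbors)
    return values


# Hypothetical local statistics held by 5 agents (illustrative numbers only).
local_stats = [4.0, 0.0, 2.0, 6.0, 8.0]
target = sum(local_stats) / len(local_stats)

after = run_consensus(local_stats, ring_neighbors(5), steps=50)
max_error = max(abs(v - target) for v in after)
```

In this sketch no agent acts as a server: every agent only reads its two neighbors' values. A sparser or larger graph has a smaller spectral gap and needs more steps to reach the same accuracy, which is one way to read the delay term mentioned in the abstract.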
Similar resources
Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits
In this paper, we introduce a multi-agent multi-armed bandit-based model for ad hoc teamwork with expensive communication. The goal of the team is to maximize the total reward gained from pulling arms of a bandit over a number of epochs. In each epoch, each agent decides whether to pull an arm and hence collect a reward, or to broadcast the reward it obtained in the previous epoch to the team a...
Decentralized Abstractions For Multi-Agent Systems Under Coupled Constraints
The goal of this report is to define abstractions for multi-agent systems with feedback interconnection in their dynamics. In the proposed decentralized framework, we specify a finite or countable transition system for each agent which only takes into account the discrete positions of its neighbors. The dynamics of the considered systems consist of two components. An appropriate feedback law wh...
Linear Contextual Bandits with Global Constraints and Objective
We consider the linear contextual bandit problem with global convex constraints and a concave objective function. In each round, the outcome of pulling an arm is a vector that depends linearly on the context of that arm. The global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. This probl...
Decentralized Multi-Agent Navigation Planning with Braids
We present a novel planning framework for navigation in dynamic, multi-agent environments with no explicit communication among agents, such as pedestrian scenes. Inspired by the collaborative nature of human navigation, our approach treats the problem as a coordination game, in which players coordinate to avoid each other as they move towards their destinations. We explicitly encode the concept...
Navigation Function Based Decentralized Control of A Multi-Agent System with Network Connectivity Constraints
A wide range of applications require or can benefit from collaborative behavior of a group of agents. The technical challenge addressed in this chapter is the development of a decentralized control strategy that enables each agent to independently navigate to ensure agents achieve a collective goal while maintaining network connectivity. Specifically, cooperative controllers are developed for n...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i8.16820